    Differential Privacy - A Balancing Act

    Data privacy is an ever more important aspect of data analyses. Historically, a plethora of privacy techniques have been introduced to protect data, but few have stood the test of time. From investigating the overlap between big data research and security and privacy research, I have found that differential privacy presents itself as a promising defender of data privacy. Differential privacy is a rigorous, mathematical notion of privacy. Nevertheless, privacy comes at a cost: in order to achieve differential privacy, we need to introduce some form of inaccuracy (i.e. error) into our analyses. Hence, practitioners need to engage in a balancing act between accuracy and privacy when adopting differential privacy. As a consequence, understanding this accuracy/privacy trade-off is vital to being able to use differential privacy in real data analyses. In this thesis, I aim to bridge the gap between differential privacy in theory and differential privacy in practice. Most notably, I aim to convey a better understanding of the accuracy/privacy trade-off by 1) implementing tools to tweak accuracy/privacy in a real use case, 2) presenting a methodology for empirically predicting error, and 3) systematizing and analyzing known accuracy improvement techniques for differentially private algorithms. Additionally, I put differential privacy into context by investigating how it can be applied in the automotive domain. Using the automotive domain as an example, I introduce the main challenges that constitute the balancing act, and provide advice for moving forward.
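
    To make the balancing act concrete, here is a minimal sketch of the Laplace mechanism, the textbook way to achieve ε-differential privacy for a numeric query. It is an illustration only, not code from the thesis; the toy data, sensitivity, and ε values are assumptions chosen purely for demonstration.

        import numpy as np

        def laplace_mechanism(true_value, sensitivity, epsilon, rng):
            """Release true_value with epsilon-differential privacy.

            The noise scale is sensitivity / epsilon: a smaller epsilon
            (stronger privacy) forces larger noise, i.e. more error.
            """
            return true_value + rng.laplace(scale=sensitivity / epsilon)

        rng = np.random.default_rng(seed=0)
        answers = rng.integers(0, 2, size=1000)  # toy 0/1 poll answers
        true_count = answers.sum()               # counting query, sensitivity 1

        for eps in [0.1, 1.0, 10.0]:             # sweep the privacy budget
            noisy = laplace_mechanism(true_count, 1, eps, rng)
            print(f"epsilon={eps:5.1f}  error={abs(noisy - true_count):7.1f}")

    Sweeping ε this way makes the trade-off tangible: each tenfold decrease in ε increases the expected error by roughly a factor of ten.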

    Efficient Error Prediction for Differentially Private Algorithms

    Differential privacy is a strong mathematical notion of privacy. Still, a prominent challenge when using differential privacy in real data collection is understanding and counteracting the accuracy loss that differential privacy imposes. As such, the accuracy/privacy trade-off of differential privacy needs to be balanced on a case-by-case basis. Applications in the literature tend to focus solely on analytical accuracy bounds, fail to include data in error prediction, or use arbitrary settings to measure error empirically. To fill this gap in the literature, we propose a novel application of factor experiments to create data-aware error predictions. In essence, factor experiments provide a systematic approach to conducting empirical experiments. To demonstrate our methodology in action, we conduct a case study where error depends on arbitrarily complex tree structures. We first construct a tool to simulate poll data. Next, we use our simulated data to construct a least squares model to predict error. Last, we show how to validate the model. Consequently, our contribution is a method for constructing error prediction models that are data-aware.
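
    As a rough illustration of the factor-experiment idea (with hypothetical factors, not the paper's actual case study), the sketch below runs a 2^2 factorial design over two factors that plausibly drive error, the privacy parameter ε and the number of answer alternatives, measures error empirically at each corner of the design, and fits a least squares model to the results.

        import itertools
        import numpy as np

        rng = np.random.default_rng(seed=1)

        def measure_error(epsilon, n_bins, trials=200):
            """Empirical mean total L1 error of a Laplace-noised
            poll histogram with n_bins answer alternatives (sensitivity 1)."""
            noise = rng.laplace(scale=1.0 / epsilon, size=(trials, n_bins))
            return np.abs(noise).sum(axis=1).mean()

        # 2^2 factorial design: a low and a high level for each factor.
        levels = {"epsilon": (0.1, 1.0), "n_bins": (4, 64)}
        X, y = [], []
        for eps, bins in itertools.product(*levels.values()):
            X.append([1.0, eps, bins])  # intercept plus the two factors
            y.append(measure_error(eps, bins))

        # Least squares fit: error ≈ b0 + b1*epsilon + b2*n_bins
        coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
        print("fitted coefficients:", coef)

    A real factor experiment would use coded (-1/+1) levels, replicates, and interaction terms, and would validate the fitted model on settings outside the design, mirroring the validation step mentioned above.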

    Security and Privacy for Big Data: A Systematic Literature Review

    Big data is currently a hot research topic, with four million hits on Google Scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, and data must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to provide meaningful, proven privacy guarantees if the data can be linked to individuals. To the best of our knowledge, there exists no systematic overview of the overlap between big data and the area of security and privacy. Consequently, this review aims to explore security and privacy research within big data by outlining and structuring the research that currently exists. Moreover, we investigate which papers connect security and privacy with big data, and which categories these papers cover. Ultimately, is security and privacy research for big data different from the rest of the research within the security and privacy domain? To answer these questions, we perform a systematic literature review (SLR) in which we collect recent papers from top conferences and categorize them in order to provide an overview of the security and privacy topics present within the context of big data. Within each category, we also present a qualitative analysis of papers representative of that specific area. Furthermore, we explore and visualize the relationships between the categories. Thus, the objective of this review is to provide a snapshot of the current state of security and privacy research for big data, and to discover where further research is required.

    Introducing Differential Privacy to the Automotive Domain: Opportunities and Challenges

    Privacy research is attracting increasing attention, especially with the upcoming General Data Protection Regulation (GDPR), which will impose stricter rules on storing and managing personally identifiable information (PII) in Europe. For vehicle manufacturers, gathering data from connected vehicles presents new analytic opportunities, but if the data also contains PII, the data comes at a higher price: it must either be properly de-identified or gathered with contracted consent from the drivers. One option is to establish contracts with every driver, but the more tempting alternative is to simply de-identify data before it is gathered, to avoid handling PII altogether. However, several real-world examples have shown cases where re-identification of supposedly anonymized data was possible, and it has also been pointed out that PII has no technical meaning. Additionally, in some cases the manufacturer might want to release statistics, either publicly or to an original equipment manufacturer (OEM). Given the challenges of properly de-identifying data, structured methods for performing de-identification should be used, rather than arbitrary removal of attributes believed to be sensitive. A promising research area to help mitigate the re-identification problem is differential privacy, a privacy model that, unlike most privacy models, gives mathematically rigorous privacy guarantees. Although research interest is large, the number of real-world implementations is still small, since understanding differential privacy and implementing it correctly is not trivial. Therefore, in this position paper, we set out to answer the questions of how and when to use differential privacy in the automotive industry, in order to bridge the gap between theory and practice. Furthermore, we elaborate on the challenges of using differential privacy in the automotive industry, and conclude with our recommendations for moving forward.

    SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication

    Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations. Such a strong notion of privacy is differential privacy, a statistical notion of privacy that makes privacy leakage quantifiable. The caveat regarding differential privacy is that while it has strong guarantees for privacy, privacy comes at a cost of accuracy. Despite this trade-off being a central and important issue in the adoption of differential privacy, there exists a gap in the literature when it comes to understanding the trade-off and how to address it appropriately. Through a systematic literature review (SLR), we investigate the state of the art within accuracy-improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data, and 2) we provide an understanding of the privacy/accuracy trade-off challenge by crystallizing different dimensions of accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and external work, and deconstruct each algorithm to examine its building blocks separately, with the aim of pinpointing which dimension of accuracy improvement each technique/approach targets. Hence, this systematization of knowledge (SoK) provides an understanding of in which dimensions, and how, accuracy improvement can be pursued without sacrificing privacy.
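
    For reference, the baseline that such accuracy-improving techniques compete against is the plain Laplace-noised histogram, sketched below (a generic illustration, not an algorithm from the SoK). Under add/remove-one-record neighboring, one record changes exactly one bin count by 1, so the histogram has L1 sensitivity 1 and per-bin Laplace(1/ε) noise satisfies ε-differential privacy.

        import numpy as np

        def dp_histogram(data, bins, epsilon, rng):
            """epsilon-DP histogram via the Laplace mechanism.

            The full histogram has L1 sensitivity 1 (one record
            affects one bin by 1), so Laplace noise with scale
            1/epsilon on every bin suffices.
            """
            counts, edges = np.histogram(data, bins=bins)
            noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
            return np.clip(noisy, 0, None), edges  # clipping is free post-processing

        rng = np.random.default_rng(seed=2)
        ages = rng.normal(40, 12, size=10_000)     # synthetic attribute
        noisy_counts, edges = dp_histogram(ages, bins=20, epsilon=0.5, rng=rng)
        print(noisy_counts.round(1))

    The techniques systematized in the review improve on this baseline along different dimensions, for example by choosing bin boundaries adaptively or exploiting structure across bins instead of noising every bin independently.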

    PLAN: Variance-Aware Private Mean Estimation

    Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could potentially be exploited to yield better algorithms. In this paper we present Private Limit Adapted Noise (PLAN), a family of differentially private algorithms for mean estimation in the setting where inputs are independently sampled from a distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of the data, but unlike previous algorithms the privacy budget is spent non-uniformly over the coordinates. Under a concentration assumption on $\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has either not taken $\boldsymbol{\sigma}$ into account, or measured error in Mahalanobis distance, in both cases resulting in $\ell_2$ error proportional to $\sqrt{d}\,\|\boldsymbol{\sigma}\|_2$, which can be up to a factor $\sqrt{d}$ larger. To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real world data.
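
    The following is a minimal numpy sketch of the non-uniform budget idea only, under simplifying assumptions of my own (no clipping, $\boldsymbol{\sigma}$ known exactly); it is not the actual PLAN algorithm. With per-coordinate Gaussian noise of scale $s_i \propto \sqrt{\sigma_i \|\boldsymbol{\sigma}\|_1}$, the heuristic privacy cost $\sum_i \sigma_i^2 / s_i^2$ matches that of the uniform allocation $s_i = \|\boldsymbol{\sigma}\|_2$, yet the expected $\ell_2$ noise norm drops from roughly $\sqrt{d}\,\|\boldsymbol{\sigma}\|_2$ to roughly $\|\boldsymbol{\sigma}\|_1$.

        import numpy as np

        rng = np.random.default_rng(seed=3)
        d = 100
        sigma = np.geomspace(0.01, 10.0, num=d)  # heavily skewed coordinate stds

        # Two noise allocations with the same heuristic privacy cost
        # sum(sigma_i^2 / s_i^2) == 1:
        s_uniform = np.full(d, np.linalg.norm(sigma))  # s_i = ||sigma||_2
        s_adapted = np.sqrt(sigma * sigma.sum())       # s_i = sqrt(sigma_i * ||sigma||_1)

        def mean_l2_noise(scales, trials=1000):
            """Average l2 norm of a Gaussian noise vector with the given scales."""
            noise = rng.normal(scale=scales, size=(trials, d))
            return np.linalg.norm(noise, axis=1).mean()

        print("uniform :", mean_l2_noise(s_uniform))  # ~ sqrt(d) * ||sigma||_2
        print("adapted :", mean_l2_noise(s_adapted))  # ~ ||sigma||_1

    With skewed $\boldsymbol{\sigma}$ the adapted allocation is markedly cheaper in $\ell_2$ error; the real algorithm must additionally handle clipping, private estimation of $\boldsymbol{\sigma}$, and formal zero-concentrated DP accounting, all of which this sketch glosses over.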

    Joint Subjective and Objective Data Capture and Analytics for Automotive Applications

    In this paper we describe a novel technological framework for the capture and analysis of both objective measurement data and subjective user experience data for automotive applications. We also investigate how the framework can be extended to address privacy issues by enforcing a rigorous privacy model called differential privacy. The system under development integrates a telematics system with a smartphone app service architecture and a data-driven analytics framework. The hypothesis is that the framework will improve the opportunities for conducting large-scale user trials of automotive functions and services, while improving the quality of collected data. To achieve this, a number of challenges are addressed in the paper, including how to design the subjective data capture mechanisms to be simple to use yet powerful, how to correlate subjective data with objective measurement data, and how to protect the privacy of users.
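
    One of the challenges mentioned, correlating subjective data with objective measurement data, often reduces to time-aligning a sparse stream of user ratings with densely sampled telemetry. The sketch below is a generic illustration of that alignment step with hypothetical column names, not the framework's actual data model, using a nearest-timestamp join with a tolerance.

        import pandas as pd

        # Objective telemetry sampled at 1 Hz (hypothetical signals).
        telemetry = pd.DataFrame({
            "timestamp": pd.date_range("2017-05-01 09:00", periods=600, freq="s"),
            "speed_kmh": 60.0,
            "brake_pressure": 0.0,
        })

        # Sparse subjective ratings entered via a smartphone app.
        ratings = pd.DataFrame({
            "timestamp": pd.to_datetime(["2017-05-01 09:02:13",
                                         "2017-05-01 09:07:45"]),
            "comfort_rating": [4, 2],
        })

        # Attach to each rating the nearest telemetry sample within 2 seconds.
        joined = pd.merge_asof(
            ratings.sort_values("timestamp"),
            telemetry.sort_values("timestamp"),
            on="timestamp",
            direction="nearest",
            tolerance=pd.Timedelta("2s"),
        )
        print(joined)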